Skip to content

Conversation

@sirtimid
Copy link
Contributor

@sirtimid sirtimid commented Apr 10, 2025

Closes #467

This PR improves vat termination by implementing a comprehensive cleanup process that ensures proper cleanup of all vat resources, reference counting and garbage collection.

Changes

  • Added markVatAsTerminated calls in Kernel.ts for vat termination
  • Implemented cleanupTerminatedVat in vat.ts with proper handling of:
    • Exports (decrements refs, clears reachable flags, adds to GC queue)
    • Imports (decrements refs, clears flags, deletes c-list entries)
    • Promises (decrements refs, removes from c-list)
    • Remaining vat-specific KV entries
  • Fixed cleanup flow by moving deleteEndpoint to end of cleanup process
  • Updated deleteVat to only handle vat config/store deletion, preventing premature cleanup
  • updated refcounts
  • Added one more button Collect Garbage to the control panel for debugging garbage collection
  • update all tests as necessary, add new ones

Integration

  • nextTerminatedVatCleanup runs at start of crank, processing terminated vat queue
  • Garbage collection already runs at end of crank via existing mechanisms

Reference Count Operations

Creation

  • New promises start with refcount=1 when created via initKernelPromise()
  • New objects initialize with reachable=1, recognizable=1 during initKernelObject()

Resolution

  • Promise resolution increments refcounts for all slots in the resolution data
  • When a promise is resolved, all slots in its resolution data get incremented

Imports/Exports

  • Adding c-list entries affects refcounts based on direction (import/export)
  • Exports have different refcount behavior than imports
  • incrementRefCount() increases both reachable and recognizable counts for objects

Garbage Collection

  • When a promise's refcount reaches 0, slots in its resolution data are decremented
  • Orphaned objects are deleted during GC when their refcounts reach 0
  • decrementRefCount() reduces both reachable and recognizable counts
  • When reachable=0 but recognizable>0, object is queued for dropExport
  • When both reachable=0 and recognizable=0, object is queued for deletion

Vat Termination

  • Objects exported by terminated vats are orphaned and queued for GC
  • Imports by terminated vats have their c-list entries deleted as if dropped
  • Promise refcounts are adjusted when cleaning up terminated vats

Special Handling

  • clearReachableFlag() specifically decrements just the reachable count for object imports
  • The reachable/recognizable distinction allows for objects that can be recognized but not used

@sirtimid sirtimid force-pushed the sirtimid/terminated-vats branch from 06d73c3 to 8a7fb12 Compare April 11, 2025 18:40
@sirtimid sirtimid force-pushed the sirtimid/terminated-vats branch from 8a7fb12 to 7cf9c29 Compare April 15, 2025 16:37
@sirtimid sirtimid marked this pull request as ready for review April 15, 2025 19:30
@sirtimid sirtimid requested a review from a team as a code owner April 15, 2025 19:30
@sirtimid sirtimid changed the title feat: Implement proper terminated vat cleanup feat: Implement refcounting and vat termination cleanup Apr 15, 2025
Copy link
Contributor

@FUDCo FUDCo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of nits to pick but nothing to hold up the train (if you want to make changes for those I promise a quick re-approval if you push an update).

*/
async #run(): Promise<void> {
for await (const item of this.#runQueueItems()) {
this.#kernelStore.nextTerminatedVatCleanup();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like an expensive operation to do on every crank, since vat terminations are comparatively rare. Also, if I read the referenced function right, this will only clean up one vat worth regardless of how many vats terminated. What I'd do (not telling you how, just how I'd do it) is keep an in-memory flag that's set by vat termination and cleared by cleanup, have the cleanup be exhaustive, and check for dangling dead vats at kernel startup to handle the case of the kernel having shut down with pending terminations un-cleaned-up after.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the feedback! This operation isn’t expensive because, as you mentioned, vat terminations are rare. When no vats are terminated, nothing happens. And even if one is, the cleanup only handles one vat at a time.

I followed Agoric’s SwingSet approach here, where they also clean up one vat per crank. That helps avoid putting too much load on the database, especially if there are lots of references. That’s also why we log each cleanup step.

Of course, I don’t know if our kernel will grow as much as Agoric’s, and I’m happy to change this if needed. I just wanted to stay consistent with the SwingSet logic for now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the clean up of one vat per crank may postdate my involvement with the SwingSet code then. The thing that struck me as expensive was reading and parsing the terminated vats list every time. But since the list is usually empty and the record is cached then I guess the hit wouldn't be too bad.

* @param eref - The ERef.
*/
function addClistEntry(endpointId: EndpointId, kref: KRef, eref: ERef): void {
function addCListEntry(endpointId: EndpointId, kref: KRef, eref: ERef): void {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clist (as if it was a word) is kinda sort of a term of art, so I'm not sure twiddling with the capitalization like this is best. That said, I wouldn't necessarily expend effort to put it back the way it was unless you feel like it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh ok! I changed it to CList because that’s how it appears in swingset code. I can definitely revert it though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really? My memory must be failing. Yeah, keep CList then. I feel dumb.

@sirtimid sirtimid merged commit e934ac6 into main Apr 16, 2025
21 checks passed
@sirtimid sirtimid deleted the sirtimid/terminated-vats branch April 16, 2025 06:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Handle terminated vats

3 participants